Group 18: FINDING GENE PATTERNS IN BREAST CANCER DATA

Florencia De Lillo s242869
Maria Kokali s232486
Nikolas Alexander Mumm s242825
Rodrigo Gallegos Dextre s243563
Émile Knut Barbé s242826

Introduction:

Most common cancer in women worldwide

1 in 8 women diagnosed

Many subtypes require a large collection of data

Question: Which genes are differentially expressed between breast cancer subtypes?

General workflow

EXPLORATORY ANALYSIS AND TIDY:

Cleaning procedure

Ratio between male and female

Age of female patients stratified by cancer status

Country of origin of female patients

Histological type of patient samples

DESEQ Analysis:

DESEQ workflow

Gene ENSG00000206585

Gene ENSG00000206652

Enriched pathways

Volcano plot
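As a rough sketch of how the points in a volcano plot can be derived from a differential expression table: each gene is placed at its log2 fold change on the x axis and -log10 of its adjusted p-value on the y axis, then labelled by two cutoffs. The function name `volcano_points`, the cutoff values (|log2FC| >= 1, padj < 0.05) and the example genes are assumptions for illustration, not values taken from this project.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return (gene, log2FC, -log10(padj), label) tuples for a volcano plot.

    `results` is an iterable of (gene, log2_fold_change, adjusted_p_value).
    The cutoffs are illustrative defaults, not the ones used in the analysis.
    """
    points = []
    for gene, log2fc, padj in results:
        # Clamp padj away from zero so the logarithm stays finite.
        neg_log10 = -math.log10(max(padj, 1e-300))
        if padj < padj_cutoff and log2fc >= lfc_cutoff:
            label = "up"
        elif padj < padj_cutoff and log2fc <= -lfc_cutoff:
            label = "down"
        else:
            label = "not significant"
        points.append((gene, log2fc, neg_log10, label))
    return points


# Hypothetical values, not results from this project.
example = [
    ("geneA", 2.5, 1e-6),
    ("geneB", -1.8, 0.001),
    ("geneC", 0.3, 0.2),
    ("geneD", 3.0, 0.5),
]
for point in volcano_points(example):
    print(point)
```

A gene with a large fold change but a high adjusted p-value (like the hypothetical geneD) stays in the "not significant" group, which is why both cutoffs are applied together.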

PCA Analysis:

Scree plot and cumulative explained variance of the principal components.

Explained variance

Cumulative variance

The high dimensionality required to explain 85% of the variance shows that variability in the data is spread across many components, so no small set of PCs summarizes the expression profiles.
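The number of components behind the 85% figure can be read off the cumulative variance curve. A minimal sketch, assuming the per-component variances are available as a list sorted from largest to smallest; the function name `components_needed` and the example numbers are hypothetical.

```python
def components_needed(explained_variance, threshold=0.85):
    """Smallest number of leading PCs whose cumulative share of the
    total variance reaches `threshold`.

    Assumes `explained_variance` is sorted in decreasing order, as PCA
    implementations usually report it.
    """
    total = sum(explained_variance)
    cumulative = 0.0
    for k, variance in enumerate(explained_variance, start=1):
        cumulative += variance / total
        # Small tolerance guards against floating-point rounding.
        if cumulative >= threshold - 1e-12:
            return k
    return len(explained_variance)


# Hypothetical variances, not values from this project.
print(components_needed([4.0, 3.0, 2.0, 1.0]))
```

With these made-up variances the cumulative shares are 0.4, 0.7 and 0.9, so the threshold is first reached at the third component.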

Different PCs are influenced by distinct sets of genes

Overlapping clusters of tumor statuses
  • The highlighted genes for each PC might be linked to specific biological pathways or processes, as they are the main drivers of variance in the data.
  • The PCA shows substantial overlap between “TUMOR FREE” and “WITH TUMOR”, indicating no clear separation of cancer statuses.
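One way to pick the genes that drive a given PC is to rank them by the absolute value of their loading on that component. The sketch below assumes the loadings for a single PC are held in a dictionary keyed by gene; the function name `top_genes` and the gene labels are hypothetical and not taken from this project.

```python
def top_genes(loadings, n=2):
    """Return the n genes with the largest absolute loading on one PC.

    `loadings` maps gene identifier -> loading. The sign is ignored for
    ranking because strongly negative loadings drive the component as
    much as strongly positive ones.
    """
    ranked = sorted(loadings, key=lambda gene: abs(loadings[gene]), reverse=True)
    return ranked[:n]


# Hypothetical loadings for a single principal component.
pc1_loadings = {"gene1": 0.10, "gene2": -0.80, "gene3": 0.50, "gene4": -0.05}
print(top_genes(pc1_loadings))
```

Running the same ranking on each PC separately would give one gene set per component, which can then be compared across PCs.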

Discussion: Biological insights

  • The DE genes in the data mainly affect X pathways
  • This is/is not consistent with the literature (1, 2, 3)

Conclusion: